CORE search results

53 research outputs found

Towards Structural Classification of Proteins based on Contact Map Overlap

Author: Nicola Yanev
Noël Malod-Dognin
Rumen Andonov
Affiliation: Projets Symbiose, Thème Bio
Publication date: 01/01/2007

A multitude of measures have been proposed to quantify the similarity between protein 3-D structures. Among these measures, contact map overlap (CMO) maximization has received sustained attention during the past decade because it offers a fine estimation of the natural homology relation between proteins. Despite this large involvement of the bioinformatics and computer science community, the performance of known algorithms remains modest. Due to the complexity of the problem, they are limited to relatively small instances and are not applicable to large-scale comparison. This paper offers a clear improvement over past methods in this respect. We present a new integer programming model for CMO and propose an exact branch-and-bound (B&B) algorithm with bounds computed by solving a Lagrangian relaxation. The efficiency of the approach is demonstrated on a popular small benchmark (Skolnick set, 40 domains). On this set our algorithm significantly outperforms the best existing exact algorithms and also provides lower and upper bounds of better quality. Some hard CMO instances have been solved for the first time and within reasonable time limits. From the values of the running time and the relative gap (the relative difference between upper and lower bounds), we obtained the correct classification for this test set. These encouraging results led us to design a harder benchmark to better assess the classification capability of our approach. We constructed a large-scale set of 300 protein domains (a subset of the ASTRAL database) that we have called Proteus 300. Using the relative gap of each of the 44,850 pairs as a similarity measure, we obtained a classification in very good agreement with SCOP. Our algorithm thus provides a powerful classification tool for large structure databases.
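To make the CMO objective and the relative gap concrete, here is a minimal Python sketch. It is not the authors' integer programming model or their Lagrangian B&B; it simply enumerates every non-crossing (order-preserving) alignment, which is only feasible for toy contact maps. Contact maps are assumed to be sets of residue-index pairs `(i, j)` with `i < j`, the function names are hypothetical, and the relative gap is assumed to be `(UB - LB) / UB`, since the abstract does not give the exact normalization.

```python
from itertools import combinations


def overlap(c1, c2, alignment):
    """Count contacts (i, j) of protein 1 whose images (a[i], a[j]) are contacts of protein 2."""
    return sum(
        1
        for i, j in c1
        if i in alignment and j in alignment and (alignment[i], alignment[j]) in c2
    )


def cmo_bruteforce(n1, c1, n2, c2):
    """Exhaustive CMO over all non-crossing alignments (exponential: toy sizes only)."""
    best, best_aln = 0, {}
    for k in range(min(n1, n2) + 1):
        for s1 in combinations(range(n1), k):
            for s2 in combinations(range(n2), k):
                # Zipping two increasing index tuples gives an order-preserving alignment.
                aln = dict(zip(s1, s2))
                score = overlap(c1, c2, aln)
                if score > best:
                    best, best_aln = score, aln
    return best, best_aln


def relative_gap(lower, upper):
    """Assumed normalization of the gap between lower and upper bounds: (UB - LB) / UB."""
    if upper <= 0:
        raise ValueError("upper bound must be positive")
    return (upper - lower) / upper
```

Under this definition a gap of 0 means the bounds coincide, i.e. optimality has been proven; an exact solver such as the one described above would report a shrinking gap as its bounds tighten.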

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

INRIA a CCSD electronic archive server

HAL-Rennes 1

Solving Maximum Clique Problem for Protein Structure Similarity

Author: Andonov Rumen
Malod-Dognin Noël
Yanev Nicola
Publication date: 01/01/2009

A basic assumption of molecular biology is that proteins sharing close three-dimensional (3D) structures are likely to share a common function and in most cases derive from the same ancestor. Computing the similarity between two protein structures is therefore a crucial task and has been extensively investigated. Evaluating the similarity of two proteins can be done by finding an optimal one-to-one matching between their components, which is equivalent to identifying a maximum weighted clique in a specific "alignment graph". In this paper we present a new integer programming formulation for solving such clique problems. The model has been implemented using the ILOG CPLEX Callable Library. In addition, we designed a dedicated branch and bound algorithm for solving the maximum cardinality clique problem. Both approaches have been integrated into VAST (Vector Alignment Search Tool), a software package for aligning protein 3D structures that is widely used at the NCBI (National Center for Biotechnology Information). The original VAST clique solver uses the well-known Bron and Kerbosch algorithm (BK). Our computational results on real-life protein alignment instances show that our branch and bound algorithm is up to 116 times faster than BK for the largest proteins.
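The clique formulation can be illustrated with a small Python sketch. It builds a hypothetical alignment graph from two intra-structure distance matrices: a node is a candidate pair `(i, k)` and two nodes are compatible when their distances agree within a tolerance `tol`. This compatibility rule is an assumption for illustration, not VAST's actual criterion. A maximum-cardinality clique is then found with a basic Carraghan–Pardalos-style branch and bound, which is neither the authors' algorithm nor Bron–Kerbosch.

```python
from itertools import combinations


def alignment_graph(d1, d2, tol):
    """Hypothetical alignment graph: nodes (i, k); edges join pairs whose distances agree."""
    nodes = [(i, k) for i in range(len(d1)) for k in range(len(d2))]
    adj = {v: set() for v in nodes}
    for (i, k), (j, l) in combinations(nodes, 2):
        # One-to-one matching: never reuse an element of either structure.
        if i != j and k != l and abs(d1[i][j] - d2[k][l]) <= tol:
            adj[(i, k)].add((j, l))
            adj[(j, l)].add((i, k))
    return adj


def max_clique(adj):
    """Basic branch and bound for a maximum-cardinality clique (adj: node -> set of neighbours)."""
    best = []

    def expand(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = list(clique)
        for v in sorted(candidates):
            # Bound: even adding every remaining candidate cannot beat the incumbent.
            if len(clique) + len(candidates) <= len(best):
                return
            expand(clique + [v], candidates & adj[v])
            candidates = candidates - {v}  # v has been fully explored

    expand([], set(adj))
    return best
```

Each clique in this graph is a set of mutually consistent pairs, so its size is the number of matched elements; the bound on `len(clique) + len(candidates)` is the simplest possible one, and practical solvers use much tighter bounds such as greedy colourings.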

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Bulgarian Digital Mathematics Library at IMI-BAS

HAL-Rennes 1

Functional geometry of protein interactomes

Author: Malod-Dognin Noël
Pržulj Nataša
Publication venue: Oxford University Press (OUP)
Publication date: 01/10/2019

Motivation: Protein–protein interactions (PPIs) are usually modeled as networks. These networks have been studied extensively using graphlets, small induced subgraphs capturing the local wiring patterns around nodes in networks, which revealed that proteins involved in similar functions tend to be similarly wired. However, such simple models can only represent pairwise relationships and cannot fully capture the higher-order organization of protein interactomes, including protein complexes.
Results: To model the multi-scale organization of these complex biological systems, we use simplicial complexes from computational geometry. The question is how to mine these new representations of protein interactomes to reveal additional biological information. To address this, we define simplets, a generalization of graphlets to simplicial complexes. Using simplets, we define a sensitive measure of similarity between simplicial complex representations that clusters them according to their data types better than other state-of-the-art measures, e.g. spectral distance or facet distribution distance. We model human and baker’s yeast protein interactomes as simplicial complexes that capture PPIs and protein complexes as simplices. On these models, we show that our newly introduced simplet-based methods cluster proteins by function better than clustering methods that use the standard PPI networks, uncovering the new underlying functional organization of the cell. We demonstrate the existence of a functional geometry in the protein interactome data and the superiority of our simplet-based methods in mining new biological information hidden in the complexity of the higher-order organization of protein interactomes.
This work was supported by the European Research Council (ERC) Starting Independent Researcher Grant 278212, the European Research Council (ERC) Consolidator Grant 770827, the Serbian Ministry of Education and Science Project III44006, the Slovenian Research Agency project J1-8155 and the awards to establish the Farr Institute of Health Informatics Research, London, from the Medical Research Council, Arthritis Research UK, British Heart Foundation, Cancer Research UK, Chief Scientist Office, Economic and Social Research Council, Engineering and Physical Sciences Research Council, National Institute for Health Research, National Institute for Social Care and Health Research, and Wellcome Trust (grant MR/K006584/1).
Peer Reviewed. Postprint (author's final draft).
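The difference between a PPI network and a simplicial complex can be shown with a short Python sketch: each protein complex is treated as a facet, all of its faces are added (downward closure), and simplices are counted by dimension. The protein names are hypothetical, and the sketch does not compute simplets; it only illustrates the representation on which they are defined.

```python
from collections import Counter
from itertools import combinations


def closure(facets):
    """All faces of the given facets: a facet of k+1 proteins contributes every subset."""
    simplices = set()
    for facet in facets:
        vertices = tuple(sorted(facet))
        for size in range(1, len(vertices) + 1):
            simplices.update(combinations(vertices, size))
    return simplices


def f_vector(simplices):
    """Number of simplices per dimension (dimension = number of vertices - 1)."""
    counts = Counter(len(s) - 1 for s in simplices)
    if not counts:
        return []
    return [counts[d] for d in range(max(counts) + 1)]
```

For a hypothetical complex {A, B, C} plus an interaction C–D, `f_vector` gives 4 vertices, 4 edges and 1 triangle, whereas the three pairwise interactions A–B, A–C and B–C give 3 vertices and 3 edges but no triangle: the two representations share the same graph, yet only the simplicial complex records that A, B and C form a complex.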

UPCommons. Portal del coneixement obert de la UPC

Shape Matching by Localized Calculations of Quasi-isometric Subsets, with Applications to the Comparison of Protein Binding Patches

Author: Cazals Frédéric
Malod-Dognin Noël
Publication venue: Springer Berlin / Heidelberg
Publication date: 01/01/2011

Given a protein complex involving two partners, the receptor and the ligand, this paper addresses the problem of comparing their binding patches, i.e. the sets of atoms accounting for their interaction. This problem has classically been addressed by searching for quasi-isometric subsets of atoms within the patches, a task equivalent to a maximum clique problem. Since this problem is NP-hard, realistic binding patches, which involve up to 300 atoms, cannot be handled. We extend previous work in two directions. First, we present a generic encoding of shapes represented as cell complexes. We partition a shape into concentric shells, based on the shelling order of the cells of the complex. The shelling order yields a shelling tree encoding the geometry and the topology of the shape. Second, for the particular case of cell complexes representing protein binding patches, we present three novel shape comparison algorithms. These algorithms combine a tree edit distance (TED) calculation on shelling trees with edit operations that respectively favor a topological or a geometric comparison of the patches. We show in particular that the geometric TED calculation strikes a balance, in terms of accuracy and running time, between purely geometric and purely topological comparisons, and we briefly comment on the biological findings reported in a companion paper.
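Tree edit distance on ordered, labelled trees can be sketched with the classic recursive forest decomposition below, using unit costs for deleting, inserting and relabelling a node. This is a generic TED, not the paper's topological or geometric edit operations, and representing a shelling tree as nested `(label, children)` tuples is an assumption made for illustration.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Tuples keep everything hashable for the cache.


def size(forest):
    """Total number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Unit-cost edit distance between two ordered forests (rightmost-root recursion)."""
    if not f and not g:
        return 0
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (lv, cv), rest_f = f[-1], f[:-1]
    (lw, cw), rest_g = g[-1], g[:-1]
    # Delete the rightmost root v of f: its children take its place.
    delete_v = forest_dist(rest_f + cv, g) + 1
    # Insert the rightmost root w of g.
    insert_w = forest_dist(f, rest_g + cw) + 1
    # Map v to w (relabel cost 1 if labels differ), then match subtrees and remainders.
    match = forest_dist(cv, cw) + forest_dist(rest_f, rest_g) + (lv != lw)
    return min(delete_v, insert_w, match)


def tree_edit_distance(t1, t2):
    return forest_dist((t1,), (t2,))
```

Memoization keeps this fast on the small trees used here; for larger trees a dedicated algorithm such as Zhang–Shasha would be the usual choice, and domain-specific costs would replace the unit costs.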

INRIA a CCSD electronic archive server

UCL Discovery

Comparing Protein 3D Structures Using A_purva

Author: Andonov Rumen
Malod-Dognin Noël
Yanev Nicola
Publication venue: HAL CCSD
Publication date: 25/11/2010

Structural similarity between proteins provides significant insights into their functions. Contact map overlap maximization (CMO) has received sustained attention during the past decade and can be considered today a credible protein structure similarity measure. We present here A_purva, an exact CMO solver that is both efficient (notably faster than previous exact algorithms) and reliable (providing accurate upper and lower bounds on the solution). These properties make it applicable to large-scale protein comparison and classification. Availability: http://apurva.genouest.org Supplementary information: A_purva's user manual, as well as many examples of protein contact maps, can be found on A_purva's web page.

INRIA a CCSD electronic archive server

Identifying cellular cancer mechanisms through pathway-driven data integration

Author: Malod-Dognin Noël
Pržulj Nataša
Windels Sam F. L.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2022

Motivation: Cancer is a genetic disease in which accumulated mutations of driver genes induce a functional reorganization of the cell by reprogramming cellular pathways. Current approaches identify cancer pathways as those most internally perturbed by gene expression changes. However, driver genes characteristically perform hub roles between pathways. Therefore, we hypothesize that cancer pathways should be identified by changes in their pathway–pathway relationships.
Results: To learn an embedding space that captures the relationships between pathways in a healthy cell, we propose pathway-driven non-negative matrix tri-factorization (NMTF). In this space, we determine condition-specific (i.e. diseased and healthy) embeddings of pathways and genes. Based on these embeddings, we define our ‘NMTF centrality’ to measure a pathway’s or gene’s functional importance, and our ‘moving distance’ to measure the change in its functional relationships. We combine both measures to predict 15 genes and pathways involved in four major cancers, predicting 60 gene–cancer associations in total, covering 28 unique genes. To further exploit driver genes’ tendency to perform hub roles, we model our network data using graphlet adjacency, which considers nodes adjacent if their interaction patterns form specific shapes (e.g. paths or triangles). We find that the predicted genes rewire pathway–pathway interactions in the immune system, and we provide literature evidence that many are druggable (15/28) and implicated in the associated cancers (47/60). We predict six druggable cancer-specific drug targets.
This work was supported by the European Research Council (ERC) Consolidator Grant 770827 and the Spanish State Research Agency AEI 10.13039/501100011033 [grant number PID2019-105500GB-I00].
Peer Reviewed. Postprint (published version).
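The factorization at the core of this approach can be sketched in Python with NumPy. The code below is a plain, unconstrained NMTF, X ≈ G S Hᵀ, fitted with the standard multiplicative updates for the Frobenius loss; it is not the authors' pathway-driven variant, and the ranks `k1`/`k2`, iteration count, seed and `eps` are illustrative assumptions. Rows of `G` can be read as low-dimensional embeddings of the rows of `X` (for example, genes).

```python
import numpy as np


def nmtf(X, k1, k2, n_iter=200, seed=0, eps=1e-10):
    """Non-negative matrix tri-factorization X ~ G @ S @ H.T via multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    G = rng.random((n, k1))
    S = rng.random((k1, k2))
    H = rng.random((m, k2))
    errors = []
    for _ in range(n_iter):
        # Each update rescales a factor by (negative gradient part) / (positive gradient part),
        # which keeps all entries non-negative; eps guards against division by zero.
        G *= (X @ H @ S.T) / (G @ S @ H.T @ H @ S.T + eps)
        H *= (X.T @ G @ S) / (H @ S.T @ G.T @ G @ S + eps)
        S *= (G.T @ X @ H) / (G.T @ G @ S @ H.T @ H + eps)
        errors.append(np.linalg.norm(X - G @ S @ H.T))  # Frobenius reconstruction error
    return G, S, H, errors
```

Fitting such a factorization separately on healthy and diseased data would give two sets of embeddings whose differences could then be measured; how the paper defines its ‘moving distance’ on top of them is not reproduced here.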

UPCommons. Portal del coneixement obert de la UPC

PubMed Central

Graphlet eigencentralities capture novel central roles of genes in pathways

Author: Malod-Dognin Noël
Pržulj Nataša
Windels Sam F. L.
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2022

Motivation: Graphlet adjacency extends regular node adjacency in a network by considering a pair of nodes adjacent if they participate in a given graphlet (small, connected, induced subgraph). Adjacencies based on different graphlets were shown to capture complementary biological functions and cancer mechanisms. To further investigate the relationships between the topological features of genes participating in molecular networks, as captured by graphlet adjacencies, and their biological functions, we build more descriptive pathway-based approaches.
Contribution: We introduce a new graphlet-based definition of the eigencentrality of genes in a pathway, graphlet eigencentrality, to identify pathways and cancer mechanisms described by a given graphlet adjacency. We compute the centrality of genes in a pathway either from the local perspective of the pathway or from the global perspective of the entire network.
Results: We show that in molecular networks of human and yeast, different local graphlet adjacencies describe different pathways (i.e., all the genes that are functionally important in a pathway are also considered topologically important by their local graphlet eigencentrality). Pathways described by the same graphlet adjacency are functionally similar, suggesting that each graphlet adjacency captures different pathway topology and function relationships. Additionally, we show that different graphlet eigencentralities describe different cancer driver genes that play central roles in pathways or in the crosstalk between them (i.e., we can predict cancer driver genes participating in a pathway by their local or global graphlet eigencentrality). This result suggests that by considering different graphlet eigencentralities, we can capture different functional roles of genes in and between pathways.
This study received support from the following sources: The European Research Council (ERC) Consolidator Grant 770827 (awarded to NP); The Spanish State Research Agency AEI 10.13039/501100011033 grant number PID2019-105500GB-I00 (awarded to NP); and University College London Computer Science (awarded to SW). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Peer Reviewed. Postprint (published version).
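As an illustration of graphlet adjacency and eigencentrality, the NumPy sketch below uses only the triangle graphlet: each pair of nodes is weighted by the number of triangles it belongs to, which for a simple undirected graph with adjacency matrix A is A ⊙ A² (element-wise product). Centrality is the leading eigenvector, found by power iteration on W + I; the identity shift leaves the eigenvectors unchanged but prevents oscillation. Using triangles only and computing a global (whole-network) centrality are simplifications; the paper considers several graphlet adjacencies as well as pathway-local variants.

```python
import numpy as np


def triangle_adjacency(A):
    """W[u, v] = number of triangles containing edge (u, v); zero for non-edges."""
    return A * (A @ A)


def eigencentrality(W, n_iter=1000, tol=1e-12):
    """Leading eigenvector of a symmetric non-negative matrix W via shifted power iteration."""
    n = W.shape[0]
    M = W + np.eye(n)  # same eigenvectors as W, but the dominant eigenvalue is strictly largest in modulus
    x = np.ones(n) / np.sqrt(n)
    for _ in range(n_iter):
        y = M @ x
        y /= np.linalg.norm(y)
        converged = np.linalg.norm(y - x) < tol
        x = y
        if converged:
            break
    return x
```

In a triangle with one pendant node attached, the three triangle nodes receive equal centrality while the pendant node, which belongs to no triangle, tends to zero: it is adjacent in the ordinary network but not under the triangle graphlet adjacency.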

UPCommons. Portal del coneixement obert de la UPC

PubMed Central

Characterizing the Morphology of Protein Binding Patches

Author: Bansal Achin
Cazals Frédéric
Malod-Dognin Noël
Publication venue: Wiley
Publication date: 26/09/2011

Let the patch of a partner in a protein complex be the collection of atoms accounting for the interaction. To improve our understanding of the structure-function relationship, we present a patch model that decouples topological and geometric properties. While the geometry is classically encoded by the atomic positions, the topology is recorded in a graph encoding the relative positions of concentric shells partitioning the interface atoms. This topological-geometric duality provides the basis of a generic dynamic programming-based algorithm that compares patches at the shell level and may favor topological or geometric features. On the biological side, we address four questions, using 249 co-crystallized heterodimers organized into biological families. First, we dissect the morphology of binding patches and show that Nature has exploited the topological and geometric degrees of freedom independently while retaining a finite set of qualitatively distinct topological signatures. Second, we argue that our shell-based comparison is effective for atomic-level comparisons and show that topological similarity is less stringent than geometric similarity. We also use the topological versus geometric duality to exhibit topo-rigid patches, whose topology (but not geometry) remains stable upon docking. Third, we use our comparison algorithms to infer specificity-related information from a database of complexes. Finally, we exhibit a descriptor that outperforms its contenders in predicting the binding affinities of the affinity benchmark. The software developed for this article is available from http://team.inria.fr/abs/vorpatch_compatch/
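The partition of interface atoms into concentric shells can be sketched as a multi-source breadth-first search from the patch boundary: boundary atoms form shell 0, and each further shell lies one neighbourhood step deeper. Using a plain atom-adjacency graph and a given set of boundary atoms is an assumption of this sketch; the paper derives the shells from the shelling order of a cell complex, and the function names are hypothetical.

```python
from collections import deque


def shell_indices(neighbours, boundary):
    """Multi-source BFS: shell index = graph distance to the nearest boundary atom.

    Atoms unreachable from the boundary receive no shell index.
    """
    shell = {atom: 0 for atom in boundary}
    queue = deque(boundary)
    while queue:
        atom = queue.popleft()
        for nb in neighbours[atom]:
            if nb not in shell:
                shell[nb] = shell[atom] + 1
                queue.append(nb)
    return shell


def group_shells(shell):
    """List of shells, outermost first, each a sorted list of atoms."""
    depth = max(shell.values(), default=-1)
    return [sorted(a for a, s in shell.items() if s == d) for d in range(depth + 1)]
```

Nesting successive shells in this way is what yields a tree-like topological summary of a patch, which can then be compared independently of the atomic coordinates.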

INRIA a CCSD electronic archive server

UCL Discovery

Dspace at IIT Bombay

A functional analysis of omic network embedding spaces reveals key altered functions in cancer

Author: Ceddia Gaia
Doria Belenguer Sergio
Malod-Dognin Noël
Pržulj Nataša
Xenos Alexandros
Publication venue: Oxford University Press
Publication date: 01/01/2023

Motivation: Advances in omics technologies have revolutionized cancer research by producing massive datasets. A common approach to deciphering these complex data is to apply embedding algorithms to molecular interaction networks. These algorithms find a low-dimensional space in which similarities between the network nodes are best preserved. Currently available embedding approaches mine the gene embeddings directly to uncover new cancer-related knowledge. However, these gene-centric approaches produce incomplete knowledge, since they do not account for the functional implications of genomic alterations. We propose a new, function-centric perspective and approach to complement the knowledge obtained from omic data.
Results: We introduce our Functional Mapping Matrix (FMM) to explore the functional organization of different tissue-specific and species-specific embedding spaces generated by a Non-negative Matrix Tri-Factorization algorithm. We also use our FMM to define the optimal dimensionality of these molecular interaction network embedding spaces. For this optimal dimensionality, we compare the FMMs of the most prevalent cancers in humans to the FMMs of their corresponding control tissues. We find that cancer alters the positions of cancer-related functions in the embedding space, while it preserves the positions of non-cancer-related ones. We exploit this spatial ‘movement’ to predict novel cancer-related functions. Finally, we predict novel cancer-related genes that the currently available methods for gene-centric analyses cannot identify; we validate these predictions by literature curation and retrospective analyses of patient survival data.
This project has received funding from the European Research Council (ERC) Consolidator Grant 770827 and the Spanish State Research Agency AEI 10.13039/501100011033 grant number PID2019-105500GB-I00.
Peer Reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC